{
 "nbformat": 4,
 "nbformat_minor": 0,
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  },
  "colab": {
   "name": "Cifar100_bench_amp.ipynb",
   "provenance": []
  }
 },
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tSC1tFE5fDHM"
   },
   "source": [
    "# Benchmark mixed precision training on Cifar100\n",
    "\n",
    "In this notebook we will benchmark 1) native PyTorch mixed precision module [`torch.cuda.amp`](https://pytorch.org/docs/master/amp.html) and 2) NVidia/Apex package.\n",
    "\n",
    "We will train Wide-ResNet model on Cifar100 dataset using Turing enabled GPU and compare training times.\n",
    "\n",
    "**TL;DR**\n",
    "\n",
    "The ranking is the following:\n",
    "- 1st place: Nvidia/Apex \"O2\"\n",
    "- 2nd place: `torch.cuda.amp`: autocast and scaler\n",
    "- 3rd place: Nvidia/Apex \"O1\"\n",
    "- 4th place: fp32\n",
    "\n",
    "According to @mcarilli: \"Native amp is more like a faster, better integrated, locally enabled O1\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "VZJDqc7vfDHV"
   },
   "source": [
    "## Installations and setup\n",
    "\n",
    "1) Recently added [`torch.cuda.amp`](https://pytorch.org/docs/master/notes/amp_examples.html#working-with-multiple-models-losses-and-optimizers) module to perform automatic mixed precision training instead of using Nvidia/Apex package is available in PyTorch >=1.6.0.\n",
    "\n",
    "In this example we only need `pynvml` and `fire` packages, assuming that `torch` and `ignite` are already installed. We can install it using pip:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "SkRXPuNRfDHX"
   },
   "source": [
    "!pip install pytorch-ignite pynvml fire"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9SPEV91DfDHZ"
   },
   "source": [
    "2) Let's install Nvidia/Apex package:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "fGtxgbj8fDHb"
   },
   "source": [
    "# Install Apex:\n",
    "# If torch cuda version and nvcc version match:\n",
    "!pip install --upgrade --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" git+https://github.com/NVIDIA/apex/\n",
    "# if above command is failing, please install apex without c++/cuda extensions:\n",
    "# !pip install --upgrade --no-cache-dir git+https://github.com/NVIDIA/apex/"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "QnihHXQpfDHb"
   },
   "source": [
    "import torch\n",
    "import torchvision\n",
    "import ignite\n",
    "torch.__version__, torchvision.__version__, ignite.__version__"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TooCbccqfDHg"
   },
   "source": [
    "3) The scripts we will execute are located in `ignite/examples/contrib/cifar100_amp_benchmark` of github repository. Let's clone the repository and setup PYTHONPATH to execute benchmark scripts:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "6xqqj0q1fDHh"
   },
   "source": [
    "!git clone https://github.com/pytorch/ignite.git /tmp/ignite\n",
    "scriptspath=\"/tmp/ignite/examples/contrib/cifar100_amp_benchmark/\"\n",
    "setup=f\"cd {scriptspath} && export PYTHONPATH=$PWD:$PYTHONPATH\""
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gX30i_abfDHi"
   },
   "source": [
    "4) Download dataset"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "ulufk4tsfDHj"
   },
   "source": [
    "from torchvision.datasets.cifar import CIFAR100\n",
    "CIFAR100(root=\"/tmp/cifar100/\", train=True, download=True)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ai6nBHZKfDHl"
   },
   "source": [
    "## Training in fp32"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "mHwsVTB6fDHq"
   },
   "source": [
    "!{setup} && python benchmark_fp32.py /tmp/cifar100/ --batch_size=256 --max_epochs=20"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "n2p-EMwGfDHs"
   },
   "source": [
    "## Training with `torch.cuda.amp`"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "xkuW1EY-fDHs"
   },
   "source": [
    "!{setup} && python benchmark_torch_cuda_amp.py /tmp/cifar100/ --batch_size=256 --max_epochs=20"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2qjtyKnOfDHt"
   },
   "source": [
    "## Training with `Nvidia/apex`\n",
    "\n",
    "\n",
    "- we check 2 optimization levels: \"O1\" and \"O2\"\n",
    "    - \"O1\" optimization level: automatic casts arount Pytorch functions and tensor methods\n",
    "    - \"O2\" optimization level: fp16 training with fp32 batchnorm and fp32 master weights"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "A6Pe4cW6fDHu"
   },
   "source": [
    "!{setup} && python benchmark_nvidia_apex.py /tmp/cifar100/ --batch_size=256 --max_epochs=20 --opt=\"O1\""
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "1aqdlPSgfDHu"
   },
   "source": [
    "!{setup} && python benchmark_nvidia_apex.py /tmp/cifar100/ --batch_size=256 --max_epochs=20 --opt=\"O2\""
   ],
   "execution_count": null,
   "outputs": []
  }
 ]
}
